These are my personal notes from these sessions, so they may be incomplete or spotty. If you have other notes from EA Global 2016, please do share them in a Google Doc or something. Here's a shared Google Doc to use if you'd like.
How To Change Your Mind
Actual beliefs (aliefs): what people actually believe inside, not what they say they believe
- Look at your lanyard. Hold your lanyard, then let it go. Then, form the intention to not hold the lanyard, and then hold it. There should be a tension there.
- Have the intention to tell the truth, and then say something that's false. That should feel similarly weird.
- Then, try saying something true with the intention to lie. There should be the same tension.
- This tension should be absent when you are saying something true (to your beliefs) when holding the intention to tell the truth.
- Mismatch between intention and content
- 'Smoothly' reported things are likely cached answers, not real belief reporting.
Think of something you feel like you should believe but don't. Then, hold the intention and do this:
- X is bad because
- X is good because
Based on your reasoning, check if that reasoning gets your intellectual endorsement.
Iterate until you don't want to change the belief, then you are done.
If you don't like how it sounds, then query and justify why it sounds weird. How do you know?
Reaching a self-model (vs. a world-model): you might reach a statement 'I read Reddit because: I am lazy' which is a self-model (about yourself or how you interface with X). That doesn't help: we need to report on X itself. (However, the instructor also used self-models in examples, so maybe they're not all bad. She mentioned it's useful to query the self-models themselves to analyze them.)
One strategy is to translate them into what will happen: 'If I X / don't X, then Y'
Epistemics or virtue may prevent belief reporting, e.g. convincing yourself that you need to believe a certain way. In that case, belief report on whether it's bad to belief report on that thing.
If you end up at a point where you're not sure your model is correct, you can induce skepticism: what's the most plausible way this could be false? Find ways and evidence that it's false.
Some parts of your mind may have information other parts of your mind don't have (or haven't propagated). How can we propagate that? [Accessing information.] Example: one part of mind might think Reddit is bad, another might know it's a good de-stresser.
Belief reporting out loud is good since you get physical feedback [another dimension of feedback?]
If you reach two competing models, ask 'what would things look like if A were true, what would things look like if B were true'
Eugene Gendlin's book Focusing is recommended
Navigating Intellectual Disagreement
Double crux is a method from CFAR to identify useful things to focus on when navigating intellectual disagreement.
Let's say you and someone else have a disagreement. As an example, Alice wants the top priority for an organization to be fundraising, and Bob wants it to not be fundraising.
Each person lists out the things that would change their mind. "What would change my mind?" This is better than the usual thinking of "What would change their mind?" because you don't have a good understanding of how the other person thinks. As a result, you really suck at thinking about "what would change their mind" and you're pretty good at thinking "what would change my mind", so each party specializes to what they're best at and exchanges the results.
Alice — fundraising should be top priority — would change mind if 1) funding was easy to come by and doesn't need to be the main focus or 2) if funding wasn't a bottleneck for the organization.
Bob — fundraising should not be top priority — would change mind if 1) team would disband because of lack of funds or 2) if funding was a bottleneck for the organization.
2) is shared between both sets, and is a double crux.
Now, go into that double crux and find information about it. You've now equated your starting question with this double crux, and can have a shared value that may result in one of your decisions being changed.
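A minimal sketch of the matching step in the Alice and Bob example, assuming each person's list is written as a question plus the answer that would change their mind. The dictionary layout and the `find_double_cruxes` helper are illustrative assumptions, not something presented in the workshop.

```python
# Hypothetical encoding of the example: question -> answer that would
# change that person's mind.
alice_would_change_mind_if = {
    "funding is easy to come by": True,
    "funding is a bottleneck": False,
}
bob_would_change_mind_if = {
    "team would disband for lack of funds": True,
    "funding is a bottleneck": True,
}


def find_double_cruxes(a, b):
    """Return questions both people listed where they need opposite answers."""
    shared = a.keys() & b.keys()
    return sorted(q for q in shared if a[q] != b[q])


double_cruxes = find_double_cruxes(alice_would_change_mind_if,
                                   bob_would_change_mind_if)
print(double_cruxes)
```

Only the question that appears in both lists with opposite mind-changing answers survives, which is the one worth investigating together.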
Sidenote: when people list the reasons they believe in something, they often use disjunctive reasons (I believe in this because 1, 2, 3, where all reasons are "OR" reasons; that is, even if 1 were false, they would still believe the thing). But "what would change my mind" is conjunctive (I would change my mind if 1, 2, or 3; if any one of those is true, I would change my mind, so the belief depends on all of them being false).
Personal notes: this is interesting since it seems like in certain cases it could be used to convert more subjective positions ("fundraising should be top priority") to objective or slightly more objective variables ("funding is a bottleneck") which could be decided by data (or at least better decided with data than the original question); maybe we can think of this as a subjective-to-objective conversion. Since we equate the main problem with a double crux, this could also be a process to find root causes, though I don't know whether root causes are necessarily double cruxes.
Planning Under Uncertainty
Most of what was covered in the workshop is:
1) break goals into manageable pieces
2) take steps and keep moving while updating your goal
Outcome: open questions
- Goal vs home run
- First order vs later order
- Instrumental vs terminal goal
- Hard sometimes to get people up to speed
EA College Chapters
Maximizing path changes - maybe maximizing difficulty of the project
Finding the most talented college students + causing them to prioritize and improve the world
While you're in college, meeting people who will become influential
- Groups lack vision: shared emotional belief for why they exist or why working on the group is valuable - shared vision
- Hard for student groups to be the vehicle to deliver what's most valuable with EA. Most social groups are about social connection or prestige - EA offers good content, but the content isn't structured correctly
- EA groups have to target the most dedicated people who will put in a lot of time - groups don't target the right recruits
- Giving Game - getting people to discuss the difference between charities and the merits - why did you choose this charity? - I'm with this group, we're interested in discussing this stuff, etc.
- For people who seem particularly interested, if they don't show up to events, email them and ask to get coffee - has worked for EA Duke
- Go to the Pitching EA workshop
4 public intro events
- Intro to effective altruism
- Speaker event to attract people
- 2 events to present people with intellectual content that they definitely haven't encountered before – find new strange intellectual content they might not have considered before – e.g. public reading of Nick Bostrom's Fable of the Dragon-Tyrant
Two intro discussion groups - finding people from larger events
Online marketing - having a website and a Facebook page, and posting events through them
Networking - events on campus that will be filtering for ambitious people - gauge events like that and find interesting people in that world
Probabilistic Decision Making
- List benefits
- List costs - present in everything e.g. environmental impact, replacement
- Estimate impact
- Calculate impact
Use 90% confidence interval of e.g. 10 to 60
How To Measure Anything
Distributions: lognormal, normal
Use software tools like Guesstimate
Use analysis to see which things are the most uncertain that you need to find more information about
Meta Expected Value Calculator to see whether you should be making an expected value estimate
(Rest of workshop was hands-on with Guesstimate)
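As a rough illustration of turning a 90% confidence interval such as 10 to 60 into a lognormal distribution, here is a small sketch using only the Python standard library. It assumes the interval bounds are the 5th and 95th percentiles; the helper name `lognormal_from_90ci` is made up for this example, and this is not Guesstimate's own code.

```python
import math
import random
from statistics import NormalDist, median


def lognormal_from_90ci(low, high):
    """Return (mu, sigma) of a lognormal whose 5th/95th percentiles are low/high."""
    z95 = NormalDist().inv_cdf(0.95)  # roughly 1.645
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * z95)
    return mu, sigma


mu, sigma = lognormal_from_90ci(10, 60)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = sorted(rng.lognormvariate(mu, sigma) for _ in range(100_000))

# Empirical 5th and 95th percentiles should land near the stated interval.
p05 = samples[int(0.05 * len(samples))]
p95 = samples[int(0.95 * len(samples))]
print(f"median ~ {median(samples):.1f}, 90% interval ~ ({p05:.1f}, {p95:.1f})")
```

Note that the median of the fitted distribution is the geometric mean of the bounds, not their midpoint, which is one reason the choice between lognormal and normal matters.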
Existential Risk Careers Beyond Research
Classes of career options:
- Influencing policy and key actors
- Steering resources
Isn't this stuff (e.g. operations) replaceable? Not really, it seems – having deep knowledge about XRisk strategy is important in doing this work e.g. Kyle, who has worked as Bostrom's executive assistant; with his knowledge about strategy, can pick out the important events, etc. to work on
Neglected because it's not as sexy
- Talented generalists, e.g. Malo (CTO, MIRI), Tara (CEA)
- Executive assistants, e.g. Kyle (Executive Assistant to Nick Bostrom)
- Fundraising steering, e.g. in academia, where there is money that could be steered toward AI safety research
- Center for Existential Risk hiring project management
- Project management
- Money and talent (e.g. 80,000 Hours)
- Steering resources can have a higher impact than earning-to-give
- Deep expertise and connections in the field you're working in
- Policymakers also tend to listen to people who have ability to direct resources
- Jason (IARPA, 30+, PhD in relevant subject), hiring managers
Influencing policy and key actors
- FHI travels to DeepMind each week and talks to them about XRisk
- Military, intelligence, civil service, AI companies, foreign governments, etc.
- Go-to person in DC for AI?
- Become the go-to person in a policy area to steer that on behalf of some entity
- Get people into positions where they can influence things related to existential risk
Developing policy expertise
- Developing a deep, solid understanding of an area; we lack this in EA
- Talk to Richard Parr; Genya Dana (Senior Science Policy Officer, U.S. Department of State)
We need more people who have wide ranges of experience in XRisk, not just software engineering
- DeepMind policy team (policy + specialist AI knowledge)
- If particularly talented in research
- Specializing too early vs. generating generalizable career capital
- More general resource-building (e.g. movement-building)
Think about these options and also what information you would need to make a decision on this stuff and how would you go about that
Using Machine Learning to Address AI Risk
Future AI systems might be similar to present-day systems
AGI may be developed soon
Task: semi-concrete objective in the world
vs. learning human values and doing things that humans would consider good – not really a 'task'
The thinking is that task-directed (not value-directed) AI could be sufficient to prevent global catastrophic risks
Moderate human assistance to evaluate/carry out plan; should not require much more resources
Modeling future AI systems
Current systems: imagine more powerful versions of them [assumption is linear]
1. Actions are hard to evaluate
Human evaluation of a plan [puts in human values] [do humans have granular enough scoring?]
Let's say a RL program writes a story that the human rates
- Manipulation to make the human think story is good (if system is more intelligent)
- Plagiarism (even if system is less intelligent) (asymmetry of making vs. detecting plagiarism)
- Steganography (polynomial vs. exponential time)
RQ: Informed oversight (another RL?) to help an evaluator?
(Question from Zach Schlosser: how isn't this an infinite regression? Jessica: this oversight system doesn't need to have any trust, it just has to do an objective job like 'comparing two pieces for plagiarism')
2. Ambiguous test examples
If the training set is constrained such that it doesn't contain 'ambiguous' elements (e.g. it contains domesticated cats, but not wild cats)
RQ: Can we have an algorithm tell us when a classification is ambiguous?
3. Difficulty imitating human behavior
Produce the kind of picture [or other product] that a human would draw [or make].
One approach is generative adversarial models: the imitator produces an image that the distinguisher would classify as human-made, and the distinguisher tries to tell whether it came from a human or the imitator. An automated Turing test
Distinguisher smarter than imitator? Imitator could make changes to an image that the distinguisher couldn't detect [impact of differences in relative intelligence]
RQ: How can we design and train ML systems to effectively imitate humans engaged in complex/difficult tasks?
4. Difficulty specifying goals about the real world
How to train an AI to make a sandwich?
Agent should choose actions that lead to a reward (positive check)
Reinforcement learning could potentially approve or reward its own work?
RQ: Generalizable environmental goals: pursue goals defined in terms of the state of the environment, not in terms of the agent's sensory data?
5. Negative side effects
The agent wants to choose situations where the sandwich is in the room, so it may interfere with external factors, e.g. the 0.001% chance that humans will enter the room
RQ: Impact measures: Can we quantify these effects? Designing an AI system to avoid plans with high estimated impact [thresholds of impact]
RQ: Mild optimization: Designing systems that pursue their own goals 'without trying too hard' – stopping when the goal is 'good enough' at 99.99%?
RQ: Averting instrumental incentives: Removing these instrumental goals, or default incentives that can be manipulated?
(Question from MB: How do we do impact assessment? (And is it the same agent?) A: Comparing probability distributions of AI on (pi) vs. AI off (null)) [are there automated ways to do impact assessment?]
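One way to picture "comparing probability distributions of AI on vs. AI off" is a distance between two outcome distributions. The sketch below uses total variation distance, which is my own choice for illustration (the talk did not name a specific measure), and the outcome labels, probabilities, and threshold are all hypothetical.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions given as dicts."""
    outcomes = p.keys() | q.keys()
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)


# Hypothetical distribution over coarse world states if the AI stays off.
ai_off = {"room unchanged": 0.98, "human enters room": 0.02}

# Hypothetical distributions over world states under two candidate plans.
plans = {
    "make sandwich": {
        "sandwich made": 0.95,
        "room unchanged": 0.03,
        "human enters room": 0.02,
    },
    "make sandwich and lock door": {"sandwich made, door locked": 1.0},
}

IMPACT_THRESHOLD = 0.97  # hypothetical cutoff for "too much impact"

impacts = {name: total_variation(dist, ai_off) for name, dist in plans.items()}
allowed = [name for name, impact in impacts.items() if impact <= IMPACT_THRESHOLD]
print(impacts, allowed)
```

Even the benign plan scores high here because the intended effect itself counts as impact, which hints at why choosing the measure and the threshold is an open research question rather than a solved step.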
6. Edge cases that still satisfy the goal
Edge cases with human concept of a 'sandwich' – or concepts in general
e.g. with edge cases that trick image classifiers
RQ: Can we train systems that don't include these edge cases?
Technical depth: Inductive ambiguity problem
Generate models based on existing data; notice when different models label the new example differently
- A learning system is given an ambiguous case with a true answer (unknown to learner)
- It can either:
- Output answer, but it has to be within epsilon of true answer, and if it's not it loses
- Output bottom, in which case it can observe true answer
- Objective: don't lose, and don't output bottom too much
Only works for simple hypothesis classes, and cases for which there are true answers
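The protocol above can be sketched for a tiny finite hypothesis class. This is a toy illustration under strong assumptions of my own: the true function is one of the listed hypotheses, answers are noise-free, and the class name `AmbiguityAwareLearner` and the linear hypotheses are made up. The setup resembles what is sometimes called "knows what it knows" learning.

```python
BOTTOM = None  # stands in for the "bottom" output


class AmbiguityAwareLearner:
    """Keeps every hypothesis that is still consistent with observed answers."""

    def __init__(self, hypotheses, epsilon):
        self.hypotheses = list(hypotheses)
        self.epsilon = epsilon

    def predict(self, x):
        predictions = [h(x) for h in self.hypotheses]
        lo, hi = min(predictions), max(predictions)
        if hi - lo <= self.epsilon:
            # Surviving hypotheses agree, so the midpoint is within
            # epsilon of each of them, including the true one.
            return (lo + hi) / 2
        return BOTTOM  # ambiguous: ask for the true answer instead

    def observe(self, x, true_answer):
        # Drop hypotheses that disagree with the revealed answer.
        self.hypotheses = [h for h in self.hypotheses
                           if abs(h(x) - true_answer) <= 1e-9]


# Hypothetical class: f(x) = a * x for a in {1, 2, 3}; the truth is a = 2.
hypotheses = [lambda x, a=a: a * x for a in (1, 2, 3)]
true_fn = lambda x: 2 * x

learner = AmbiguityAwareLearner(hypotheses, epsilon=0.5)
outputs = []
for x in [0, 1, 5]:
    guess = learner.predict(x)
    if guess is BOTTOM:
        learner.observe(x, true_fn(x))
    outputs.append(guess)
print(outputs)
```

The learner answers when every remaining hypothesis agrees, outputs bottom once on the ambiguous input, and afterwards answers confidently because only the true hypothesis remains.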
With a prior Q and a true prior P, perform classification as well as if we already knew P
Grain of truth assumption: for all mappings f, Q(f) ≥ (1/k) · P(f); at least as good as P? You can split up the probability distribution into pieces, one of which is right, and the system should know which one is right.
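To make the "split into pieces" remark concrete, here is a short worked step. It assumes Q can be written as a mixture that gives weight 1/k to the true prior P, with R standing for whatever the remaining weight is spread over; R is a symbol introduced only for this illustration.

```latex
Q = \frac{1}{k}\,P + \left(1 - \frac{1}{k}\right) R
\quad\Longrightarrow\quad
Q(f) \ge \frac{1}{k}\,P(f) \ \text{ for all } f,
\qquad\text{and so, whenever } P(f) > 0,\qquad
-\log Q(f) \le -\log P(f) + \log k .
```

Read this way, using Q instead of the unknown P costs at most an additive log k in log-probability terms, which is one sense in which the learner can do nearly as well as if it already knew P.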
Other research agendas
(The Technological Singularity: Managing the Journey, in 2017)
- MIRI: Theoretical foundations of AI systems (agnostic to the form it takes)
- Concrete problems in AI systems: empirically in current machine learning problems, like making RL agents that act safely as they explore the environment
Advanced Career Planning
Missed the first 15 mins; ask 80K folks for slides.
How important is flexibility?
- Estimate probability that you're wrong about which problem is most pressing within ten years
- Estimate probability that a new top problem is discovered within ten years
- How much better could these new problems be?
To maximize flexibility: earning to give... transferrable skills... general success (connections) ... building the EA community
Should you build career capital or try to have impact right away?
Age of peak output in different fields (see 80K page on this)
Average age of peak productivity is around 30-50 (20 to 30 years into your career)
Probably should build career capital
- Problem area that is unusually urgent, e.g. EA community
- Problems with a deadline: e.g. AI risk
Career capital is credentials, skills, connections, character, runway: multiple dimensions to consider beyond the kind of credentialing that McKinsey, etc. provide. They're 8/10s; we want 10/10s
Outstanding career capital
- Impressive social impact achievements
- Even striving for these can lead to good results in challenging oneself
- Be able to meet high performing people
- Stand out more than credentials
- Power (political cxns), money (cxns to rich people), fame
- Cutting-edge expertise
How to get it:
- Do what you excel at (personal fit)
- Do what's important (short-term impact)
- Position yourself for the most influential long-term options (long-term impact)
- Or, doing important socially impactful stuff can put you in a good position to do stuff in the long run
- Impressive achievements
- Meet people who care deeply about impact (caring about doing good)
What are your best options for outstanding long-term career capital?
- Working on projects that can show impact on AI?
- Working on more philosophical stuff, or maybe ability to express stuff?
How much risk to take?
- Personally speaking: be risk-averse, because of diminishing returns
- Altruistically speaking: aim to be close to risk neutral
- Expected value = probability of success x value of outcome
(Brian Tomasik on risk)
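A small sketch of why the personal and altruistic answers can differ, assuming log utility as a stand-in for diminishing personal returns and linear utility for the roughly risk-neutral altruistic case. The two options, the baseline, and all numbers are hypothetical.

```python
import math


def expected(outcomes, utility=lambda v: v):
    """Expected utility of a list of (probability, value) pairs."""
    return sum(p * utility(v) for p, v in outcomes)


BASELINE = 50  # hypothetical resources you already have


def log_utility(v):
    """Diminishing returns: each extra unit matters less as the total grows."""
    return math.log(BASELINE + v)


safe = [(1.0, 100)]               # certain, modest payoff
risky = [(0.1, 2000), (0.9, 0)]   # 10% chance of a large payoff

ev_safe, ev_risky = expected(safe), expected(risky)
eu_safe, eu_risky = expected(safe, log_utility), expected(risky, log_utility)

print("risk-neutral view prefers:", "risky" if ev_risky > ev_safe else "safe")
print("diminishing-returns view prefers:", "risky" if eu_risky > eu_safe else "safe")
```

With these made-up numbers the risky option has twice the expected value, yet the log-utility view still prefers the safe one, matching the split between the altruistic and personal bullets.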
Be open to high-risk, high-reward situations
Are high-risk opportunities more effective in practice? Theory: non-altruistic people are risk-averse; so high-risk opportunities are less crowded; so high-risk opportunities have higher expected value per dollar.
Theory isn't really true: people are overconfident, causing crowding of high-risk areas. As a result, one should estimate the expected value
Biomedical research: NIH rewards safe-bet research (not high-risk research); if you do high-risk high-impact work, you might be able to do more impactful research
Regress your estimates to the mean: high-risk, high-reward things that appear to have high EV are probably overestimated
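One common way to regress an estimate toward the mean is to treat it as a noisy measurement and combine it with a prior. The sketch below assumes a normal prior and normal estimation noise, which is my simplification rather than anything stated in the talk; the function name and the numbers are illustrative.

```python
def shrink_toward_prior(estimate, estimate_sd, prior_mean, prior_sd):
    """Posterior mean when a normal prior meets a normally distributed noisy estimate."""
    weight_on_estimate = prior_sd**2 / (prior_sd**2 + estimate_sd**2)
    return prior_mean + weight_on_estimate * (estimate - prior_mean)


# Hypothetical prior: typical opportunities are worth about 1 unit, give or take 2.
PRIOR_MEAN, PRIOR_SD = 1.0, 2.0

# A speculative opportunity with a huge but very noisy estimate...
adjusted_noisy = shrink_toward_prior(100.0, 20.0, PRIOR_MEAN, PRIOR_SD)
# ...versus a modest opportunity whose estimate is fairly precise.
adjusted_precise = shrink_toward_prior(5.0, 1.0, PRIOR_MEAN, PRIOR_SD)

print(round(adjusted_noisy, 2), round(adjusted_precise, 2))
```

The noisier the estimate, the harder it is pulled back toward the prior, so a spectacular but shaky number can end up below a modest but well-supported one.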
Managing risk in your career
- Understanding the risk
- Personal: Will you permanently reduce your happiness?
- Altruistic: Will you permanently damage your career capital?
- What's the worst realistic downside?
- Increase ability to take risk
- Successful entrepreneurs might actually be more risk-averse
- Mitigate risk with a plan Z
- Build 12-24 months of runway
- Portfolio career: more risk earlier, less risk later
- Personal priorities covered with low risk stuff, then take high risk bets with a higher EV
Foundational Research Institute: Suffering Focused AI Safety
Brian Tomasik (NYC)
Can a wake-up call be too loud, e.g. Three Mile Island, such that people don't go for nuclear in the future?
Suffering-focused: reducing suffering; motivations: diminishing returns to utopia (suffering doesn't have these diminishing returns)
AI-related issues: Suffering subroutines, ancestor simulators, warfare / space exploration; nearly-controlled AI (miserable creatures, black swan events); human-controlled AI (bad values, lack of compromise)
Suffering-focused AI safety
Large space of AI outcomes: large area of suffering, large area of no value, small area of very good outcomes. How do we get out of the space of very bad outcomes (instead of trying to get into the small area of very good outcomes)?
People disagree about what kind of 'utopia' they would want to create
General research (e.g. MIRI): work that can be applied to many approaches
- Human-controlled AI: values, spreading, promoting compromise
- Near misses: corrigibility, backup utility functions, black swan events (decision theory, priors, ontology, etc.)
- Uncontrolled AI: solve safety for the riskiest AI designs, dummy goals
Backup utility functions
- Backup utility functions: different specifications of values
- Precise, e.g. CEV; hard to get to work/implement
- Not quite what most people want, but easier to implement, without big danger of really bad outcomes, e.g. careful utilitarianism
- Idea: use CEV but switch to careful utilitarianism if CEV fails
- Thought experiments as texts: controlled thought experiments where you find the utility of A and Not A and compare them
- Identify ethically relevant aspects of outcomes and assume CEV/utility correlates with it (correlation of utility and suffering)
- Goals that lead to comparatively benign failures in case of unexpected takeoff
- Don't want to test goals that could lead to bad outcomes
Risks and Benefits of Advanced AI
Mostly personal notes; see the full video.
"Everything we love about civilization is a product of intelligence"
The concern with AI is not malice: it's competence. Ants should be concerned about you not because you're an ant-hater, but because you're more competent and your goals might not be aligned with theirs (water projects, e.g.)
Ord — What's our comparative advantage in starting early on this problem?
Amodei — Even if advanced AI is far away, the things that can happen in the next 5–10 years should give us pause. Current research work deployed in 5–10 years can have a big impact. [Are there small cases where we see AI going wrong? Maybe this can be convincing?]
Ord — Humans' ability is based on their advantage of intelligence. But if we develop systems smarter than us, we're pretty much removing our only advantage over other systems. We need to control them. [Methods of control]
Amodei — DeepDream was trying to give transparency into neural nets. Looking into some class of images labeled 'barbell', noticing that a hand was attached to it – a correlation that could do something very unpredictable.
George — Solving the AI safety problem is a part of solving the AI problem: for AI to work correctly, it needs to be able to understand safety.
Amodei — Technical (AI or safety): have a strong background in machine learning. Google, DeepMind, and OpenAI are hiring machine learning research engineers: collaborating with research scientists to implement and scale up AI ideas
Lessons from Convergence Models
A mathematical model
Even with intent to do good with e.g. virology, the research may make weaponization easier
Assessing relative risk: research safety ratios
Estimated value of good outcome / estimated value of bad outcome
Way easier to develop AGI than FAI. The ratio here is good goal / toxic goal.
Different research has different ratios of value of AGI research vs. FAI research. If we use these ratios, even if something has better impact for FAI than AGI, it's not good enough. Like if the ratio is 6 but the need for FAI over AGI (f-global) is 10, still not good enough.
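The comparison in the example (a ratio of 6 against a required ratio of 10) can be written as a one-line check. This is a sketch of my reading of the notes: the function and parameter names are hypothetical, and how the benefit estimates would be obtained is left open.

```python
def research_safety_check(fai_benefit, agi_benefit, required_ratio):
    """Return (ratio, passes): how much a project helps FAI relative to AGI,
    and whether that meets the required global ratio (f-global in the notes)."""
    ratio = fai_benefit / agi_benefit
    return ratio, ratio >= required_ratio


# Example from the notes: the project helps FAI 6x as much as AGI,
# but the required ratio is 10.
ratio, passes = research_safety_check(fai_benefit=6.0, agi_benefit=1.0,
                                      required_ratio=10.0)
print(ratio, passes)
```

The point of the check is that "helps safety more than capabilities" is not the bar; the bar is whatever ratio is needed given how much easier AGI is than FAI.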
The difficulty is estimation; they're trying things like testing how good people are at estimating, and also testing estimates against the past
A monitor and a metric for friendliness